
    NGS data analysis: a review of major tools and pipeline frameworks for variant discovery

    The analysis of genetic data has always been challenging because of the sheer amount of information available and the difficulty of isolating what is relevant. Over the years, however, progress in sequencing techniques has been accompanied by parallel advances in computational methods, up to the current application of artificial intelligence. The phases of sequence analysis can be summarized as follows: quality assessment, alignment, pre-variant processing, variant calling, and variant annotation. In this article we review and comment on the tools used in each phase of genetic sequencing and analyze the advantages and drawbacks offered by each of them.
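
    As a purely illustrative sketch of the five phases named above, the following chains widely used open-source tools (FastQC, BWA-MEM, samtools, GATK, VEP) with Python's subprocess module; the tool choices, file names, and flags are assumptions for the example, not selections made by the article.

        import subprocess

        # Hypothetical inputs; a real run would substitute project-specific paths
        # and would need the BWA/GATK reference indexes built beforehand.
        REF = "reference.fa"
        R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"

        def run(cmd):
            """Run one pipeline stage, failing fast on a non-zero exit code."""
            subprocess.run(cmd, shell=True, check=True)

        run(f"fastqc {R1} {R2}")                                        # 1. quality assessment
        run(f"bwa mem {REF} {R1} {R2} > sample.sam")                    # 2. alignment
        run("samtools sort sample.sam -o sample.bam")                   # 3. pre-variant processing
        run("samtools index sample.bam")
        run(f"gatk HaplotypeCaller -R {REF} -I sample.bam -O out.vcf")  # 4. variant calling
        run("vep -i out.vcf -o out.annotated.vcf --cache")              # 5. variant annotation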

    Revolutionizing Pharmaceuticals: Generative Artificial Intelligence as a bibliographic assistant

    Generative artificial intelligence (GAI) has burst into biomedical and pharmaceutical research, fundamentally transforming the way scientists approach literature review, experiment design, and reagent and antibody selection. This article explores how GAI, supported by advanced machine learning and natural language processing models, has revolutionized these processes. GAI streamlines literature review by extracting relevant information, identifying emerging patterns and trends in the scientific literature, and generating innovative hypotheses. It also acts as an advanced search tool, allowing researchers to quickly access accurate information in an ocean of data. A prominent example of this application is BenchSci, a platform that uses GAI to recommend reagents and antibodies based on real experimental data and the scientific literature. This integration of GAI into experimental design promises to accelerate research, reduce costs, and improve the precision of experiments. In sum, GAI stands as a catalyst for discoveries in pharmaceutical and biomedical research, offering unprecedented potential to advance the understanding and treatment of diseases and to improve decision-making in the industry.

    Smart Buildings IoT Networks Accuracy Evolution Prediction to Improve Their Reliability Using a Lotka–Volterra Ecosystem Model

    The Internet of Things (IoT) is the paradigm that has contributed most to the development of smart buildings in our society. This technology makes it possible to monitor every aspect of a smart building and to improve its operation. One of the main challenges faced by IoT networks is that the data they collect may be unreliable, since IoT devices can lose accuracy for several reasons (sensor wear, sensor aging, poorly constructed buildings, etc.). The aim of our work is to study the evolution of IoT networks in smart buildings over time. The hypothesis we have tested is that, by applying the Lotka–Volterra equations to the network as if it were a community of living organisms (an ecosystem model), the reliability of the system and its components can be predicted. The model comprises a set of differential equations that describe the relationship between an IoT network and multiple IoT devices. Building on the Lotka–Volterra model, we propose in this article a model in which the predators are the non-precision IoT devices and the prey are the precision IoT devices. Furthermore, a third species is introduced, the maintenance staff, which impacts the interaction between the two species, helping the prey to survive within the ecosystem. This is the first Lotka–Volterra model applied in the field of IoT. Our work establishes a proof of concept in the field and opens a wide spectrum of applications for biological models in IoT.

    This paper has been partially supported by the Salamanca Ciudad de Cultura y Saberes Foundation under the Talent Attraction Program (CHROMOSOME project).
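
    The abstract does not give the concrete equations or coefficients, so the following is a generic three-species Lotka–Volterra sketch under the mapping it describes (prey = precision devices, predators = non-precision devices, third species = maintenance staff); every rate constant and the coupling through maintenance are illustrative assumptions, not values from the paper.

        from scipy.integrate import solve_ivp

        # x: precision devices (prey), y: non-precision devices (predators),
        # z: maintenance staff. All rates below are invented for illustration.
        a, b = 0.9, 0.4   # prey growth rate, predation (degradation) rate
        c, d = 0.3, 0.5   # predator gain per encounter, predator decay
        e = 0.6           # maintenance "repairs" predators back into prey

        def ecosystem(t, state):
            x, y, z = state
            dx = a * x - b * x * y + e * y * z   # repaired devices rejoin the prey
            dy = c * x * y - d * y - e * y * z   # degraded devices decay or are repaired
            dz = 0.0                             # staffing level held constant here
            return [dx, dy, dz]

        sol = solve_ivp(ecosystem, t_span=(0, 50), y0=[10.0, 2.0, 1.0])
        print(sol.y[:, -1])  # device populations at the end of the horizon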

    File formats used in next generation sequencing: A literature review

    Next-generation sequencing (NGS) has revolutionized the field of genomics, allowing a detailed and precise look at DNA. As this technology advanced, the need arose for standardized file formats to represent, analyze, and store the vast data sets produced. In this article, we review the key file formats used in NGS: FASTA, FASTQ, BED, GFF, and VCF. The FASTA format, one of the oldest, provides a basic representation of genomic and protein sequences, identifiable by unique headers. FASTQ is essential for NGS, as it stores both the sequence and the associated quality information. BED provides a tabular representation of genomic loci, while GFF details the localization and structure of genomic features in reference sequences. Finally, VCF has emerged as the predominant standard for documenting genetic variants, from simple SNPs to complex structural variants. The adoption and adaptation of these formats have been fundamental to progress in bioinformatics and genomics. They provide a foundation on which to build sophisticated analyses, from gene discovery and function prediction to the identification of disease-associated variants. With a clear understanding of these formats, researchers and practitioners are better equipped to harness the power and potential of next-generation sequencing.

    This study has been funded by the AIR Genomics project (file number CCTT3/20/SA/0003), through the 2020 call for R&D Projects Oriented towards Excellence and Competitive Improvement of the CCTT by the Institute of Business Competitiveness of Castilla y León and FEDER funds.
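
    To make the FASTQ structure concrete, here is a minimal reader sketch: each record spans four lines (header, sequence, separator, quality string), and qualities are typically Phred scores encoded as ASCII characters offset by 33. The record contents are invented for the example.

        from io import StringIO

        fastq = StringIO(
            "@read1\n"
            "GATTACA\n"
            "+\n"
            "IIIIHHG\n"
        )

        def read_fastq(handle):
            """Yield (name, sequence, Phred qualities) per four-line record."""
            while True:
                header = handle.readline().strip()
                if not header:
                    return
                seq = handle.readline().strip()
                handle.readline()                 # '+' separator line
                qual = handle.readline().strip()
                # Phred+33: quality = ASCII code of the character minus 33.
                yield header[1:], seq, [ord(ch) - 33 for ch in qual]

        for name, seq, quals in read_fastq(fastq):
            print(name, seq, quals)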

    Application of Deep Symbolic Learning in NGS

    The application of Deep Symbolic Learning in genomic analysis has begun to gain traction as a promising approach to interpreting and understanding the vast data sets derived from DNA sequencing. Next-generation sequencing (NGS) techniques have revolutionized the field of clinical genetics and human biology, generating massive volumes of data that require advanced tools for analysis. However, traditional methods are often too abstract or complicated for clinical staff. This work focuses on exploring how Deep Symbolic Learning, a subfield of explainable artificial intelligence (XAI), can be effectively applied to NGS data. A detailed evaluation of the suitability of different architectures will be carried out.

    Integrating Nextflow and AWS for Large-Scale Genomic Analysis: A Hypothetical Case Study

    This article explores the innovative combination of Nextflow and Amazon Web Services (AWS) to address the challenges inherent in large-scale genomic analysis. Focusing on a hypothetical case called "The Pacific Genome Atlas", it illustrates how a research organization could approach the sequencing and analysis of 10,000 genomes. Although the "Pacific Genome Atlas" is a fictional example used for illustrative purposes only, it highlights the real challenges associated with large genomic projects, such as handling huge volumes of data and the need for intensive computational analysis. Through the integration of Nextflow, a workflow management tool, with the AWS cloud infrastructure, we demonstrate how these challenges can be overcome, offering scalable, flexible, and cost-effective solutions for genomic research. The adoption of modern technologies, such as those described in this article, is essential to advance the field of genomics and accelerate scientific discoveries.

    The present study has been funded by the AIR Genomics project (file number CCTT3/20/SA/0003) through the 2020 call for R&D Projects Oriented towards Excellence and Competitive Improvement of the CCTT by the Institute of Business Competitiveness of Castilla y León and FEDER funds.
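
    Nextflow pipelines are written in its own Groovy-based DSL, so the following Python sketch only illustrates the kind of work its AWS Batch executor automates for every process/sample pair: packaging a task as a container job and submitting it to a queue. The queue and job-definition names are invented, and working boto3 credentials and Batch infrastructure are assumed.

        import boto3

        batch = boto3.client("batch", region_name="us-west-2")

        # One alignment job per sample; a workflow manager would generate and
        # track thousands of these submissions automatically.
        for sample in ["sample_001", "sample_002"]:
            job = batch.submit_job(
                jobName=f"align-{sample}",
                jobQueue="genomics-spot-queue",        # hypothetical Batch queue
                jobDefinition="bwa-mem-container:1",   # hypothetical job definition
                containerOverrides={
                    "command": ["bash", "-c",
                                f"bwa mem ref.fa {sample}_R1.fq {sample}_R2.fq > {sample}.sam"],
                },
            )
            print(sample, job["jobId"])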

    Review of state-of-the-art algorithms for genomics data analysis pipelines

    The advent of big data and advanced genomic sequencing technologies has presented challenges in terms of data processing for clinical use. The complexity of detecting and interpreting genetic variants, coupled with the vast array of tools and algorithms and the heavy computational workload, has made the development of comprehensive genomic analysis platforms crucial to enabling clinicians to quickly provide patients with genetic results. This chapter reviews and describes the pipeline for analyzing massive genomic data using both short-read and long-read technologies, discussing the current state of the main tools used at each stage and the role of artificial intelligence in their development. It also introduces DeepNGS (deepngs.eu), an end-to-end genomic analysis web platform, including its key features and applications.

    Evaluation of points of improvement in NGS data analysis

    DNA sequencing is a fundamental technique in molecular biology that allows the exact sequence of nucleotides in a DNA sample to be read. Over the past decades, DNA sequencing has seen significant advances, evolving from manual and laborious techniques to modern high-throughput techniques. Despite these advances, the interpretation and analysis of sequencing data continue to present challenges. Artificial intelligence (AI), and in particular machine learning, has emerged as an essential tool to address these challenges. The application of AI in the sequencing pipeline refers to the use of algorithms and models to automate, optimize, and improve the precision of the sequencing process and its subsequent analysis. The Sanger sequencing method, introduced in the 1970s, was one of the first to be widely used. Although effective, this method is slow and not suitable for sequencing large amounts of DNA, such as entire genomes. With the arrival of next-generation sequencing (NGS) in the 21st century, greater speed and efficiency in obtaining genomic data have been achieved. However, the exponential increase in the amount of data produced has created a bottleneck in its analysis and interpretation.

    Application of hybrid algorithms and Explainable Artificial Intelligence in genomic sequencing

    DNA sequencing is one of the fields that has advanced the most in recent years within clinical genetics and human biology. However, the large amount of data generated through next-generation sequencing (NGS) techniques requires advanced data analysis processes that are sometimes complex and beyond the capabilities of clinical staff. Therefore, this work aims to shed light on the possibilities of applying hybrid algorithms and explainable artificial intelligence (XAI) to data obtained through NGS. The suitability of each architecture will be evaluated phase by phase in order to offer final recommendations that allow implementation in the clinical sequencing workflow.

    Deep Symbolic Learning Architecture for Variant Calling in NGS

    The variant detection process (variant calling) is fundamental in bioinformatics, demanding maximum precision and reliability. This study examines an innovative integration strategy between a traditional pipeline developed in-house and an advanced Intelligent System (IS). Although the original pipeline already had tools based on traditional algorithms, it had limitations, particularly in the detection of rare or unknown variants. The IS was therefore introduced to provide an additional layer of analysis, capitalizing on deep and symbolic learning techniques to improve and enhance previous detections. The main technical challenge lay in interoperability. To overcome it, Nextflow, a scripting language designed to manage complex bioinformatics workflows, was employed. Nextflow facilitated communication and efficient data transfer between the original pipeline and the IS, thus guaranteeing compatibility and reproducibility. After the variant calling step of the original system, the results were transmitted to the IS, where a meticulous sequence of analyses was applied, from preprocessing to data fusion. As a result, an optimized set of variants was generated and integrated with the previous results. Variants corroborated by both tools were considered highly reliable, while discrepancies indicated areas for detailed investigation. The product of this integration advanced to the subsequent stages of the pipeline, usually annotation or interpretation, contextualizing the variants from biological and clinical perspectives. This adaptation not only maintained the original functionalities of the pipeline but also enhanced them with the IS, establishing a new standard in the variant calling process. This research offers a robust and efficient model for the detection and analysis of genomic variants, highlighting the promise and applicability of hybrid learning in bioinformatics.

    This study has been funded by the AIR Genomics project (file number CCTT3/20/SA/0003), through the 2020 call for R&D Projects Oriented towards Excellence and Competitive Improvement of the CCTT by the Institute of Business Competitiveness of Castilla y León and FEDER funds.
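
    As a minimal sketch of the corroboration step the abstract describes, the snippet below keys variants from two call sets by (chromosome, position, ref, alt): the intersection is treated as high-confidence, and the symmetric difference as discrepancies to investigate. The VCF file names are invented for the example.

        def load_variants(vcf_path):
            """Collect (chrom, pos, ref, alt) keys from a VCF, skipping header lines."""
            keys = set()
            with open(vcf_path) as vcf:
                for line in vcf:
                    if line.startswith("#"):
                        continue
                    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
                    keys.add((chrom, int(pos), ref, alt))
            return keys

        classic = load_variants("pipeline_calls.vcf")   # hypothetical path
        intelligent = load_variants("is_calls.vcf")     # hypothetical path

        high_confidence = classic & intelligent   # corroborated by both tools
        needs_review = classic ^ intelligent      # discrepancies for detailed review

        print(f"{len(high_confidence)} corroborated, {len(needs_review)} flagged")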